Data Sampling and Dimensionality Reduction Approaches for Reranking ASR Outputs Using Discriminative Language Models

Authors

Erinç Dikici

Murat Semerci

Murat Saraclar

Ethem Alpaydin

Abstract

This paper investigates various approaches to data sampling and dimensionality reduction for discriminative language models (DLM). As a feature-based language modeling approach, DLM aims to rerank the output of an automatic speech recognition (ASR) system using discriminatively trained feature parameters. Using a Turkish morphology-based feature set, we examine the use of online Principal Component Analysis (PCA) as a dimensionality reduction method. We use the ranking perceptron and the ranking SVM as two alternative discriminative modeling techniques, and apply data sampling to improve their efficiency. We obtain a reduction in word error rate (WER) of 0.4%, significant at p < 0.001 over the baseline perceptron result.
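To make the reranking idea concrete, the following is a minimal sketch of a generic pairwise ranking perceptron applied to N-best lists. It is not the authors' implementation: the abstract does not give the update rule, the feature set, or the sampling scheme, so the function names (`train_ranking_perceptron`, `rerank`), the sparse dictionary features, the `margin` and `lr` parameters, and the use of per-hypothesis word error counts as the ranking target are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
# One hypothesis: (sparse feature vector, number of word errors vs. the reference).
# Both the representation and the error-count target are illustrative assumptions.
Hypothesis = Tuple[Features, int]


def score(weights: Features, feats: Features) -> float:
    """Linear model score: dot product of the weight and feature vectors."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_ranking_perceptron(
    nbest_lists: List[List[Hypothesis]],
    epochs: int = 5,
    margin: float = 0.0,
    lr: float = 1.0,
) -> Features:
    """Generic pairwise ranking perceptron (hypothetical sketch).

    For every pair (a, b) in an N-best list where a has strictly fewer word
    errors than b, the weights are updated whenever a does not outscore b
    by more than `margin`.
    """
    weights: Features = {}
    for _ in range(epochs):
        for nbest in nbest_lists:
            for feats_a, errs_a in nbest:
                for feats_b, errs_b in nbest:
                    if errs_a >= errs_b:
                        continue  # keep only pairs where a is strictly better
                    if score(weights, feats_a) - score(weights, feats_b) <= margin:
                        # Move the weights toward the better hypothesis...
                        for name, value in feats_a.items():
                            weights[name] = weights.get(name, 0.0) + lr * value
                        # ...and away from the worse one.
                        for name, value in feats_b.items():
                            weights[name] = weights.get(name, 0.0) - lr * value
    return weights


def rerank(weights: Features, nbest: List[Hypothesis]) -> int:
    """Return the index of the highest-scoring hypothesis in an N-best list."""
    return max(range(len(nbest)), key=lambda i: score(weights, nbest[i][0]))


# Toy usage: two hypotheses with made-up features; the second has fewer errors.
toy_nbest = [({"bad": 1.0}, 2), ({"good": 1.0}, 0)]
toy_weights = train_ranking_perceptron([toy_nbest])
best_index = rerank(toy_weights, toy_nbest)
```

Because the pairwise loop grows quadratically with the N-best list size, restricting training to a subset of hypotheses or pairs is one natural way in which data sampling could reduce training cost; the specific sampling strategies studied in the paper are not described in the abstract.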

Related resources

Unsupervised training methods for discriminative language modeling

Discriminative language modeling (DLM) aims to choose the most accurate word sequence by reranking the alternatives output by the automatic speech recognizer (ASR). The conventional (supervised) way of training a DLM requires a large amount of acoustic recordings together with their manual reference transcriptions. These transcriptions are used to determine the target ranks of the ASR outputs, ...

Survey on Three Reranking Models for Discriminative Parsing

This survey is inspired by the so-called reranking techniques in natural language processing (NLP). The aim of this survey is to provide an overview of three main reranking tasks particularly for discriminative parsing. We will focus on the motivation for discriminative reranking, on the three models, boosting model, support vector machine (SVM) model and voted perceptron model, on the procedur...

Hallucinating system outputs for discriminative language modeling

Project overview • NSF funded project and recent JHU summer workshop team • General topic: discriminative language modeling for ASR and MT – Learning language models with discriminative objectives • Specific topic: learning models from text only – Enabling use of much more training data; adaptation scenarios • Have made some progress with ASR models (topic today) – Less progress on improving MT...

Low-Dimensional Discriminative Reranking

The accuracy of many natural language processing tasks can be improved by a reranking step, which involves selecting a single output from a list of candidate outputs generated by a baseline system. We propose a novel family of reranking algorithms based on learning separate low-dimensional embeddings of the task’s input and output spaces. This embedding is learned in such a way that prediction ...

Performance Comparison of Training Algorithms for Semi-Supervised Discriminative Language Modeling

Discriminative language modeling (DLM) has been shown to improve the accuracy of automatic speech recognition (ASR) systems, but it requires large amounts of both acoustic and text data for training. One way to overcome this is to use simulated hypotheses instead of real hypotheses for training, which is called semi-supervised training. In this study, we compare six different perceptron algorith...

Publication date: 2011
